Abstract

We consider algorithms for “smoothed online convex optimization” problems, a variant of the 
class of online convex optimization problems that is strongly related to metrical task systems. 
Prior literature on these problems has focused on two performance metrics: regret and the 
competitive ratio. There exist known algorithms with sublinear regret and known algorithms 
with constant competitive ratios; however, no known algorithm achieves both simultaneously. 
We show that this is due to a fundamental incompatibility between these two metrics - no 
algorithm (deterministic or randomized) can achieve sublinear regret and a constant competitive 
ratio, even in the case when the objective functions are linear. However, we also exhibit an 
algorithm that, for the important special case of one-dimensional decision spaces, provides 
sublinear regret while maintaining a competitive ratio that grows arbitrarily slowly. 


1 Introduction 

In an online convex optimization (OCO) problem, a learner interacts with an environment in a 
sequence of rounds. During each round $t$: (i) the learner must choose an action $x^t$ from a convex
decision space $F$; (ii) the environment then reveals a convex cost function $c^t$; and (iii) the learner
experiences cost $c^t(x^t)$. The goal is to design learning algorithms that minimize the cost experienced
over a (long) horizon of T rounds. 

In this paper, we study a generalization of online convex optimization that we term smoothed 
online convex optimization (SOCO). The only change in SOCO compared to OCO is that the cost 
experienced by the learner each round is now $c^t(x^t) + \|x^t - x^{t-1}\|$, where $\|\cdot\|$ is a seminorm. That
is, the learner experiences a “smoothing cost” or “switching cost” associated with changing the
action, in addition to the “operating cost” $c(\cdot)$.

Many applications typically modeled using online convex optimization have, in reality, some 
cost associated with a change of action. For example, switching costs in the $k$-armed bandit
setting have received considerable attention. Additionally, a strong motivation for studying
SOCO comes from the recent developments in dynamic capacity provisioning algorithms for data
centers [18-20, 22, 24-26], where the goal is to dynamically control the number and placement of
active servers ($x^t$) in order to minimize a combination of the delay and energy costs (captured by
$c^t$) and the switching costs involved in cycling servers into power saving modes and migrating data
($\|x^t - x^{t-1}\|$). Further, SOCO has applications even in contexts where there are no costs associated
with switching actions. For example, if there is concept drift in a penalized estimation problem, it 
is natural to make use of a regularizer (switching cost) term in order to control the speed of the 
drift of the estimator. 

Two communities, two performance metrics. Though the precise formulation of SOCO 
does not appear to have been studied previously, there are two large bodies of literature on closely 
related problems: (i) the online convex optimization (OCO) literature within the machine learning 
community, e.g., [15, 27], and (ii) the metrical task system (MTS) literature within the algorithms
community, e.g., [8, 23]. We discuss these literatures in detail in Section 3. While there are several
differences between the formulations in the two communities, a notable difference is that they focus 
on different performance metrics. 

In the OCO literature, the goal is typically to minimize the regret, which is the difference 
between the cost of the algorithm and the cost of the offline optimal static solution. In this context, 
a number of algorithms have been shown to provide sublinear regret (also called “no regret”). For 
example, online gradient descent can achieve $O(\sqrt{T})$-regret [27]. Though such guarantees are
proven only in the absence of switching costs, we show in Section 3.1 that the same regret bound
also holds for SOCO. 

In the MTS literature, the goal is typically to minimize the competitive ratio, which is the 
maximum ratio between the cost of the algorithm and the cost of the offline optimal (dynamic) 
solution. In this setting, most results tend to be “negative,” e.g., when the $c^t$ are arbitrary, for any
metric space the competitive ratio of an MTS algorithm with states chosen from that space grows
without bound as the number of states grows. However, these results still yield competitive
ratios that do not increase with the number of tasks, i.e., with time. Throughout, we neglect
dependence of the competitive ratio on the number of states, and describe the competitive ratio as 
“constant” if it does not grow with time. Note also that positive results have emerged when the 
cost function and decision space are convex [20] . 

Interestingly, the focus on different performance metrics in the OCO and MTS communities 
has led the communities to develop quite different styles of algorithms. The difference between
the algorithms is highlighted by the fact that all algorithms developed in the OCO community have
poor competitive ratio and all algorithms developed in the MTS community have poor regret. 

However, it is natural to seek algorithms with both low regret and low competitive ratio. In 
learning theory, doing well for both corresponds to being able to learn both static and dynamic 
concepts well. In the design of a dynamic controller, low regret shows that the control is not more 
risky than static control, whereas low competitive ratio shows that the control is nearly as good as 
the best dynamic controller. 

The first to connect the two metrics were [3], who treated the special case where the switching 
costs are a fixed constant, instead of a norm. In this context, they showed how to translate bounds 
on regret to bounds on the competitive ratio, and vice versa. A recent breakthrough was made 
by [9], who used a primal-dual approach to develop an algorithm that performs well for the “$\alpha$-unfair
competitive ratio,” which is a hybrid of the competitive ratio and regret that provides comparison
to the dynamic optimal when $\alpha = 1$ and to the static optimal when $\alpha = \infty$ (see Section 2). Their
algorithm was not shown to perform well simultaneously for regret and the competitive ratio, but
the result highlights the feasibility of unified approaches for algorithm design across competitive
ratio and regret.^2

Summary of contributions. This paper explores the relationship between minimizing regret 
and minimizing the competitive ratio. To this end, we seek to answer the following question: “Can 
an algorithm simultaneously achieve both a constant competitive ratio and a sublinear regret?” 

To answer this question, we show that there is a fundamental incompatibility between regret and 
competitive ratio — no algorithm can maintain both sublinear regret and a constant competitive 
ratio (Theorems 2, 3, and 4). This “incompatibility” does not stem from a pathological example:
it holds even for the simple case when $c^t$ is linear and $x^t$ is scalar. Further, it holds for both
deterministic and randomized algorithms and also when the $\alpha$-unfair competitive ratio is considered.

Though providing both sublinear regret and a constant competitive ratio is impossible, we show 
that it is possible to “nearly” achieve this goal. We present an algorithm, “Randomly Biased Greedy” 
(RBG), which achieves a competitive ratio of $(1 + \gamma)$ while maintaining $O(\max\{T/\gamma, \gamma\})$ regret for
$\gamma \ge 1$ on one-dimensional action spaces. If $\gamma$ can be chosen as a function of $T$, then this algorithm
can balance between regret and the competitive ratio. For example, it can achieve sublinear regret
while having an arbitrarily slowly growing competitive ratio, or it can achieve $O(\sqrt{T})$ regret while
maintaining an $O(\sqrt{T})$ competitive ratio. Note that, unlike the scheme of [9], this algorithm has
a finite competitive ratio on continuous action spaces and provides a simultaneous guarantee on 
both regret and the competitive ratio. 


2 Problem Formulation 


An instance of smoothed online convex optimization (SOCO) consists of a convex decision/action
space $F \subseteq (\mathbb{R}^+)^n$ and a sequence of cost functions $\{c^1, c^2, \dots\}$, where each $c^t : F \to \mathbb{R}^+$. At each
time $t$, a learner/algorithm chooses an action vector $x^t \in F$ and the environment chooses a cost
function $c^t$. Define the $\alpha$-penalized cost with lookahead $i$ for the sequence $\dots, x^t, c^t, \dots$
to be

$$C_i^\alpha(A, T) = \mathbb{E}\left[\sum_{t=1}^{T} c^t(x^{t+i}) + \alpha\|x^{t+i} - x^{t+i-1}\|\right],$$

where $x^1, \dots, x^T$ are the decisions of algorithm $A$, the initial action is $x^i = 0$ without loss of
generality, the expectation is over randomness in the algorithm, and $\|\cdot\|$ is a seminorm on $\mathbb{R}^n$. We
usually suppress the parameter $T$.
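
As a concrete reading of this definition, the following Python sketch evaluates the penalized cost for one fixed (deterministic) action sequence in one dimension, so the expectation is dropped. The function name `penalized_cost`, the list layout, and the use of `abs` as the seminorm are illustrative assumptions, not part of the paper.

```python
def penalized_cost(costs, actions, alpha=1.0, lookahead=0, norm=abs):
    """Alpha-penalized cost with lookahead i for a deterministic action sequence.

    costs[t-1] is the cost function c^t for t = 1..T.
    actions[k] is the action x^k; actions[lookahead] is the initial action x^i.
    """
    total = 0.0
    for t in range(1, len(costs) + 1):
        x_now = actions[t + lookahead]        # x^{t+i}
        x_prev = actions[t + lookahead - 1]   # x^{t+i-1}
        total += costs[t - 1](x_now) + alpha * norm(x_now - x_prev)
    return total


costs = [lambda x: x, lambda x: 1.0 - x]                  # c^1, c^2
print(penalized_cost(costs, [0.0, 0.0, 1.0]))             # OCO timing with switching costs
print(penalized_cost(costs, [0.0, 0.0, 0.0, 1.0], lookahead=1))  # MTS timing
```

Setting `alpha=0.0` and `lookahead=0` gives the classical OCO cost, while `alpha=1.0` and `lookahead=1` gives the MTS cost.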

In the OCO and MTS literatures the learners pay different special cases of this cost. In OCO
the algorithm “plays first,” giving a 0-step lookahead, and switching costs are ignored, yielding $C_0^0$.
In MTS the environment plays first, giving the algorithm 1-step lookahead ($i = 1$), and uses $\alpha = 1$,
yielding $C_1^1$. Note that we sometimes omit the superscript when $\alpha = 1$, and the subscript when
$i = 0$.

One can relate the MTS and OCO costs by relating $C_i^\alpha$ to $C_{i-1}^\alpha$, as done by [3] and [9]. The
penalty due to not having lookahead is

$$c^t(x^t) - c^t(x^{t+1}) \le \nabla c^t(x^t) \cdot (x^t - x^{t+1}) \le \|\nabla c^t(x^t)\|_2 \cdot \|x^t - x^{t+1}\|_2, \quad (1)$$

where $\|\cdot\|_2$ is the Euclidean norm. We adopt the assumption, common in the OCO literature, that
$\|\nabla c^t(\cdot)\|_2$ is bounded on a given instance, which thus bounds the difference between the costs of
MTS and OCO (with switching cost), $C_1$ and $C_0^1$.

^2 There is also work on achieving simultaneous guarantees with respect to the static and dynamic optima in other
settings, e.g., decision making on lists and trees [5], and there have been applications of algorithmic approaches from
machine learning to MTS.

Performance metrics. The performance of a SOCO algorithm is typically evaluated by comparing
its cost to that of an offline “optimal” solution, but the communities differ in their choice
of benchmark, and whether to compare additively or multiplicatively.

The OCO literature typically compares against the optimal offline static action, i.e.,

$$OPT_s = \min_{x \in F} \sum_{t=1}^{T} c^t(x),$$

and evaluates the regret, defined as the (additive) difference between the algorithm’s cost and that
of the optimal static action vector. Specifically, the regret $R_i(A)$ of algorithm $A$ with lookahead $i$
on instances $\mathcal{C}$ is less than $\rho(T)$ if for any sequence of cost functions $(c^1, \dots, c^T) \in \mathcal{C}$,

$$C_i^0(A) - OPT_s \le \rho(T). \quad (2)$$

Note that for any problem and any $i \ge 1$ there exists an algorithm $A$ for which $R_i(A)$ is non-positive;
however, an algorithm that is not designed specifically to minimize regret may have $R_i(A) > 0$.

This traditional definition of regret omits switching costs and lookahead (i.e., $R_0(A)$). One can
generalize regret to define $R'_i(A)$ by replacing $C_i^0(A)$ with $C_i^1(A)$ in Equation (2). Specifically,
$R'_i(A)$ is less than $\rho(T)$ if for any sequence of cost functions $(c^1, \dots, c^T) \in \mathcal{C}$,

$$C_i^1(A) - OPT_s \le \rho(T).$$

Except where noted, we take $\mathcal{C}$ to be the set of sequences of convex functions mapping $(\mathbb{R}^+)^n$ to $\mathbb{R}^+$ with
(sub)gradient uniformly bounded over the sequence. Note that we do not require differentiability;
throughout this paper, references to gradients can be read as references to subgradients.

The MTS literature typically compares against the optimal offline (dynamic) solution,

$$OPT_d = \min_{x \in F^T} \sum_{t=1}^{T} c^t(x^t) + \|x^t - x^{t-1}\|,$$

and evaluates the competitive ratio. The cost most commonly considered is $C_1$. More generally,
we say the competitive ratio with lookahead $i$, denoted by $CR_i(A)$, is $\rho(T)$ if for any sequence of
cost functions $(c^1, \dots, c^T) \in \mathcal{C}$,

$$C_i(A) \le \rho(T)\,OPT_d + O(1). \quad (3)$$

Bridging competitiveness and regret. Many hybrid benchmarks have been proposed to bridge
static and dynamic comparisons. For example, Adaptive-Regret [16] is the maximum regret over
any interval, where the “static” optimum can differ for different intervals, and internal regret [7]
compares the online policy against a simple perturbation of that policy. We adopt the static-dynamic
hybrid proposed in the most closely related literature, the $\alpha$-unfair competitive
ratio, which we denote by $CR_i^\alpha(A)$ for lookahead $i$. For $\alpha \ge 1$, $CR_i^\alpha(A)$ is $\rho(T)$ if Equation (3)
holds with $OPT_d$ replaced by

$$OPT_d^\alpha = \min_{x \in F^T} \sum_{t=1}^{T} c^t(x^t) + \alpha\|x^t - x^{t-1}\|.$$

Specifically, the $\alpha$-unfair competitive ratio with lookahead $i$, $CR_i^\alpha(A)$, is $\rho(T)$ if for any sequence
of cost functions $(c^1, \dots, c^T) \in \mathcal{C}$,

$$C_i(A) \le \rho(T)\,OPT_d^\alpha + O(1).$$

For $\alpha = 1$, $OPT_d^\alpha$ is the dynamic optimum; as $\alpha \to \infty$, $OPT_d^\alpha$ approaches the static optimum.
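
For intuition in one dimension, both benchmarks can be computed by brute force on a finite grid of actions. The sketch below is only an illustration under assumptions that are not in the paper: actions are restricted to `grid`, the seminorm is the absolute value, the initial action is `x0`, and the helper names are made up.

```python
def static_opt(costs, grid):
    """Best fixed action in hindsight, ignoring switching costs (static optimum on the grid)."""
    return min(sum(c(x) for c in costs) for x in grid)


def dynamic_opt(costs, grid, alpha=1.0, x0=0.0):
    """Best action sequence in hindsight with switching cost alpha * |x - y| (on the grid)."""
    # best[j]: cheapest total cost of any sequence so far that currently sits at grid[j]
    best = [alpha * abs(x - x0) + costs[0](x) for x in grid]
    for c in costs[1:]:
        best = [c(x) + min(b + alpha * abs(x - y) for b, y in zip(best, grid))
                for x in grid]
    return min(best)


f1 = lambda x: 3.0 * x
f2 = lambda x: 3.0 * (1.0 - x)
costs = [f1, f2, f1, f2]
grid = [0.0, 0.5, 1.0]
print(static_opt(costs, grid), dynamic_opt(costs, grid, 1.0), dynamic_opt(costs, grid, 5.0))
```

On this small instance, raising the switching weight from 1 to 5 makes alternating unattractive, so the dynamic benchmark stops moving and its value coincides with the static one.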

To bridge the additive versus multiplicative comparisons used in the two literatures, we define
the competitive difference. The $\alpha$-unfair competitive difference with lookahead $i$ on instances
$\mathcal{C}$, $CD_i^\alpha(A)$, is $\rho(T)$ if for any sequence of cost functions $(c^1, \dots, c^T) \in \mathcal{C}$,

$$C_i(A) - OPT_d^\alpha \le \rho(T).$$

This measure allows for a smooth transition between regret (when $\alpha$ is large enough) and an additive
version of the competitive ratio when $\alpha = 1$.

3 Background 

In the following, we briefly discuss related work on both online convex optimization and metrical 
task systems, to provide context for the results in the current paper. 

3.1 Online Convex Optimization 

The OCO problem has a rich history and a wide range of important applications. In computer 
science, OCO is perhaps most associated with the $k$-experts problem [13, 21], a discrete-action
version of online optimization wherein at each round $t$ the learning algorithm must choose a number
between 1 and $k$, which can be viewed as following the advice of one of $k$ “experts.” However, OCO
also has a long history in other areas, such as portfolio management.

The identifying features of the OCO formulation are that (i) the typical performance metric 
considered is regret, (ii) switching costs are not considered, and (iii) the learner must act before 
the environment reveals the cost function. In our notation, the cost function in the OCO literature
is $C_0^0(A)$ and the performance metric is $R_0(A)$. Following [27], the typical assumptions
are that the decision space $F$ is non-empty, bounded and closed, and that the Euclidean norms of
gradients $\|\nabla c^t(\cdot)\|_2$ are also bounded.

In this setting, a number of algorithms have been shown to achieve “no regret,” i.e., sublinear
regret, $R_0(A) = o(T)$. An important example is online gradient descent (OGD), which is
parameterized by learning rates $\eta_t$. OGD works as follows.

Algorithm 1 (Online Gradient Descent, OGD). Select an arbitrary $x^1 \in F$. At time step $t \ge 1$,
select $x^{t+1} = P(x^t - \eta_t \nabla c^t(x^t))$, where $P(y) = \arg\min_{x \in F} \|x - y\|_2$ is the projection under the
Euclidean norm.
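
A one-dimensional sketch of this update follows. It assumes the decision space is an interval `[lo, hi]` (so the Euclidean projection is a clip), takes the (sub)gradients as callables, and defaults to the step size $\eta_t = 1/\sqrt{t}$; all names are illustrative.

```python
import math


def ogd(grads, x1, lo, hi, eta=lambda t: 1.0 / math.sqrt(t)):
    """Return [x^1, ..., x^{T+1}], where grads[t-1] is a (sub)gradient oracle for c^t."""
    xs = [x1]
    for t, grad in enumerate(grads, start=1):
        step = xs[-1] - eta(t) * grad(xs[-1])   # unconstrained gradient step
        xs.append(min(max(step, lo), hi))        # projection P onto [lo, hi]
    return xs


# Example: every c^t(x) = (x - 0.5)^2 on F = [0, 1], starting from x^1 = 0.
xs = ogd([lambda x: 2.0 * (x - 0.5)] * 200, x1=0.0, lo=0.0, hi=1.0)
print(xs[-1])
```

The iterates stay inside the interval at every step and settle at the minimizer of the repeated cost.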

With appropriate learning rates $\eta_t$, OGD achieves sublinear regret for OCO. In particular,
the variant of [27] uses $\eta_t = \Theta(1/\sqrt{t})$ and obtains $O(\sqrt{T})$-regret. Tighter bounds are possible
in restricted settings. In particular, $O(\log T)$ regret is achieved by choosing $\eta_t = 1/(pt)$ in
settings where the cost function additionally is twice differentiable and has minimal curvature, i.e.,
$\nabla^2 c^t(x) - pI_n$ is positive semidefinite for all $x$ and $t$, where $I_n$ is the identity matrix of size $n$. In
addition to algorithms based on gradient descent, more recent algorithms such as Online Newton
Step and Follow the Approximate Leader [15] also attain $O(\log T)$-regret bounds for a class of cost
functions.

None of the work discussed above considers switching costs. To extend the literature discussed
above from OCO to SOCO, we need to track the switching costs incurred by the algorithms. This
leads to the following straightforward result, proven in Appendix A.

Proposition 1. Consider an online gradient descent algorithm $A$ on a finite dimensional space
with learning rates such that $\sum_{t=1}^{T} \eta_t = O(\rho_1(T))$. If $R_0(A) = O(\rho_2(T))$, then we have $R'_0(A) =
O(\rho_1(T) + \rho_2(T))$.
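
The bound can be motivated by a short calculation, sketched here under assumed notation: $G$ is a uniform bound on the Euclidean norm of the gradients, and $K$ is a constant with $\|x\| \le K\|x\|_2$, which exists for any seminorm on a finite-dimensional space (neither symbol appears in the statement above). Because Euclidean projection onto a convex set is non-expansive and $x^t = P(x^t)$,

```latex
\begin{align*}
\|x^{t+1} - x^t\|_2
  &= \|P(x^t - \eta_t \nabla c^t(x^t)) - P(x^t)\|_2
   \le \eta_t \|\nabla c^t(x^t)\|_2 \le \eta_t G, \\
\sum_{t=1}^{T} \|x^{t+1} - x^t\|
  &\le K G \sum_{t=1}^{T} \eta_t = O(\rho_1(T)).
\end{align*}
```

Adding this switching cost to the regret without switching costs gives a total of order $\rho_1(T) + \rho_2(T)$; the complete argument is the one in Appendix A.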

Interestingly, the choices of $\eta_t$ used by the algorithms designed for OCO also turn out to be
good choices to control the switching costs of the algorithms. The OGD variants discussed above,
which use $\eta_t = 1/\sqrt{t}$ and $\eta_t = 1/(pt)$, still have $O(\sqrt{T})$-regret and $O(\log T)$-regret respectively when
switching costs are considered, since in these cases $\rho_1(T) = O(\rho_2(T))$. Note that a similar result
can be obtained for Online Newton Step [15].
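
The two step-size sums behind this remark can be checked numerically. The sketch below (the function name and the default `p = 1.0` are assumptions) compares the sum of $1/\sqrt{t}$ with $2\sqrt{T}$ and the sum of $1/(pt)$ with $(1 + \ln T)/p$, the standard integral bounds.

```python
import math


def step_size_sums(T, p=1.0):
    """Return (sum of 1/sqrt(t), sum of 1/(p*t)) for t = 1..T."""
    sqrt_sum = sum(1.0 / math.sqrt(t) for t in range(1, T + 1))
    log_sum = sum(1.0 / (p * t) for t in range(1, T + 1))
    return sqrt_sum, log_sum


for T in (10, 1000, 100000):
    s, l = step_size_sums(T)
    print(T, s <= 2 * math.sqrt(T), l <= 1 + math.log(T))
```

Both sums therefore grow no faster than the corresponding regret bounds, which is why adding switching costs does not change the order of the regret.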

Importantly, though the regret of OGD algorithms is sublinear, it can easily be shown that the 
competitive ratio of these algorithms is unbounded. 

3.2 Metrical Task Systems 

Like OCO, MTS also has a rich history and a wide range of important applications. Historically, 
MTS is perhaps most associated with the $k$-server problem. In this problem, there are $k$ servers,
each in some state, and a sequence of requests is incrementally revealed. To serve a request, the 
system must move one of the servers to the state necessary to serve the request, which incurs a 
cost that depends on the source and destination states. 

The formulation of SOCO in Section 2 is actually, in many ways, a special case of the most
general MTS formulation. In general, the MTS formulation differs in that (i) the cost functions
$c^t$ are not assumed to be convex, (ii) the decision space is typically assumed to be discrete and
is not necessarily embedded in a vector space, and (iii) the switching cost is an arbitrary metric
$d(x^t, x^{t-1})$ rather than a seminorm $\|x^t - x^{t-1}\|$. In this context, the cost function studied by MTS
is typically $C_1$ and the performance metric of interest is the competitive ratio, specifically $CR_1(A)$,
although the $\alpha$-unfair competitive ratio $CR_1^\alpha$ also receives attention.

The weakening of the assumptions on the cost functions, and the fact that the competitive ratio 
uses the dynamic optimum as the benchmark, means that most of the results in the MTS setting 
are “negative” when compared with those for OCO. In particular, it has been proven that, given an 
arbitrary metric decision space of size $n$, any deterministic algorithm must be $\Omega(n)$-competitive [8].
Further, any randomized algorithm must be $\Omega(\sqrt{\log n / \log\log n})$-competitive [6].

These results motivate imposing additional structure on the cost functions to attain positive 
results. For example, it is commonly assumed that the metric is the uniform metric, in which 
$d(x, y)$ is equal for all $x \ne y$; this assumption was made by [3] in a study of the tradeoff between
competitive ratio and regret. For comparison with OCO, an alternative natural restriction is to 
impose convexity assumptions on the cost function and the decision space, as done in this paper. 

Upon restricting $c^t$ to be convex, $F$ to be convex, and $\|\cdot\|$ to be a seminorm, the MTS
formulation becomes quite similar to the SOCO formulation. This restricted class has been the
focus of a number of recent papers, and some positive results have emerged. For example,
when $F$ is a one-dimensional normed space,^3 a deterministic online algorithm called
Lazy Capacity Provisioning (LCP) is 3-competitive.

Importantly, though the algorithms described above provide constant competitive ratios, in all 
cases it is easy to see that the regret of these algorithms is linear. 

^3 We need only consider the absolute value norm, since for every seminorm $\|\cdot\|$ on $\mathbb{R}$, $\|x\| = \|1\||x|$.

4 The Incompatibility of Regret and the Competitive Ratio


As noted in the introduction, there is considerable motivation to perform well for regret and competitive
ratio simultaneously. None of the algorithms discussed so far achieves
this goal. For example, Online Gradient Descent has sublinear regret but its competitive ratio is 
infinite. Similarly, Lazy Capacity Provisioning is 3-competitive but has linear regret. 

This is no accident. We show below that the two goals are fundamentally incompatible: any 
algorithm that has sublinear regret for OCO necessarily has an infinite competitive ratio for MTS; 
and any algorithm that has a constant competitive ratio for MTS necessarily has at least linear 
regret for OCO. Further, our results give lower bounds on the simultaneous guarantees that are 
possible. 

In discussing this “incompatibility,” there are a number of subtleties as a result of the differences 
in formulation between the OCO literature, where regret is the focus, and the MTS literature, where 
competitive ratio is the focus. In particular, there are four key differences which are important 
to highlight: (i) OCO uses lookahead i = 0 while MTS uses i = 1; (ii) OCO does not consider 
switching costs (a = 0) while MTS does (a = 1); (iii) regret uses an additive comparison while the 
competitive ratio uses a multiplicative comparison; and (iv) regret compares to the static optimal 
while competitive ratio compares to the dynamic optimal. Note that the first two are intrinsic to
the costs, while the latter are intrinsic to the performance metric. The following teases apart which 
of these differences create incompatibility and which do not. In particular, we prove that (i) and 
(iv) each create incompatibilities. 

Our first result in this section states that there is an incompatibility between regret in the OCO 
setting and the competitive ratio in the MTS setting, i.e., between the two most commonly studied 
measures $R_0(A)$ and $CR_1(A)$. Naturally, the incompatibility remains if switching costs are added
to regret, i.e., $R'_0(A)$ is considered. Further, the incompatibility remains when the competitive
difference is considered, and so both the comparison with the static optimal and the dynamic
optimal are additive. In fact, the incompatibility remains even when the $\alpha$-unfair competitive
ratio/difference is considered. Perhaps most surprisingly, the incompatibility remains when there
is lookahead, i.e., when $C_i$ and $C_{i+1}$ are considered.

Theorem 2. Consider an arbitrary seminorm $\|\cdot\|$ on $\mathbb{R}^n$, constants $\gamma > 0$, $\alpha \ge 1$, and $i \in \mathbb{N}$.
There is a $\mathcal{C}$ containing a single sequence of cost functions such that, for all deterministic and
randomized algorithms $A$, either $R_i(A) = \Omega(T)$ or, for large enough $T$, both $CR_{i+1}^\alpha(A) \ge \gamma$ and
$CD_{i+1}^\alpha(A) \ge \gamma T$.

The incompatibility arises even in “simple” instances; the proof of Theorem 2 uses linear cost
functions and a one-dimensional decision space, and the construction of the cost functions does not
depend on $T$ or $A$.

The cost functions used by regret and the competitive ratio in Theorem 2 are “off by one,”
motivated by the different settings in OCO and MTS. However, the following shows that parallel
results also hold when the cost functions are not “off by one,” i.e., for $R_0(A)$ versus $CR_0^\alpha(A)$ and
$R'_1(A)$ versus $CR_1^\alpha(A)$.

Theorem 3. Consider an arbitrary seminorm $\|\cdot\|$ on $\mathbb{R}^n$, constants $\gamma > 0$ and $\alpha \ge 1$, and a
deterministic or randomized online algorithm $A$. There is a $\mathcal{C}$ containing two cost functions such
that either $R_0(A) = \Omega(T)$ or, for large enough $T$, both $CR_0^\alpha(A) \ge \gamma$ and $CD_0^\alpha(A) \ge \gamma T$.

Theorem 4. Consider an arbitrary norm $\|\cdot\|$ on $\mathbb{R}^n$. There is a $\mathcal{C}$ containing two cost functions
such that, for any constants $\gamma > 0$ and $\alpha \ge 1$ and any deterministic or randomized online algorithm
$A$, either $R'_1(A) = \Omega(T)$ or, for large enough $T$, $CR_1^\alpha(A) \ge \gamma$.





The impact of these results can be stark. It is impossible for an algorithm to learn static 
concepts with sublinear regret in the OCO setting, while having a constant competitive ratio for 
learning dynamic concepts in the MTS setting. More strikingly, in control theory, any dynamic 
controller that has a constant competitive ratio must have at least linear regret, and so there are 
cases where it does much worse than the best static controller. Thus, one cannot simultaneously 
guarantee the dynamic policy is always as good as the best static policy and is nearly as good as 
the optimal dynamic policy. 

Theorem 4 is perhaps the most interesting of these results. Theorem 2 is due to seeking to
minimize different cost functions ($c^t$ and $c^{t+1}$), while Theorem 3 is due to the hardness of attaining
a small $CR_0$, i.e., of mimicking the dynamic optimum without 1-step lookahead. In contrast, for
Theorem 4, algorithms exist with strong performance guarantees for each measure individually, and
the measures are aligned in time. However, Theorem 4 must consider the (nonstandard) notion of
regret that includes switching costs ($R'$), since otherwise the problem is trivial.

4.1 Proofs 

We now prove the results above. We use one-dimensional examples; however, these examples can 
easily be embedded into higher dimensions if desired. We show proofs only for competitive ratio; 
the proofs for competitive difference are similar. 

Let $\bar\alpha = \max(1, \|\alpha\|)$. Given $a > 0$ and $b \ge 0$, define two possible cost functions on $F = [0, 1/\bar\alpha]$:
$f_1^\alpha(x) = b + ax\bar\alpha$ and $f_2^\alpha(x) = b + a(1 - x\bar\alpha)$. These functions are similar to those used by prior work to
study online gradient descent to learn a concept of bounded total variation. To simplify notation,
let $D(t) = 1/2 - \mathbb{E}[x^t]\bar\alpha$, and note that $D(t) \in [-1/2, 1/2]$.
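
For readers who want to experiment with the construction, the two cost functions can be written down directly. In this sketch `alpha_bar` stands for the scaling constant $\max(1, \|\alpha\|)$ that sets the width of the decision space, and the factory name `make_costs` is an assumption.

```python
def make_costs(a, b, alpha_bar):
    """Return (f1, f2) on F = [0, 1/alpha_bar].

    f1(x) = b + a * x * alpha_bar        (cheapest at x = 0)
    f2(x) = b + a * (1 - x * alpha_bar)  (cheapest at x = 1/alpha_bar)
    """
    f1 = lambda x: b + a * x * alpha_bar
    f2 = lambda x: b + a * (1.0 - x * alpha_bar)
    return f1, f2


f1, f2 = make_costs(a=3.0, b=0.5, alpha_bar=2.0)
print(f1(0.0), f2(0.5))       # each function attains its minimum b at one endpoint
print(f1(0.25) + f2(0.25))    # the pair sums to a + 2b at every point
```

Because the two functions sum to $a + 2b$ everywhere, any fixed action pays $a/2 + b$ per round on average when they are presented equally often, while a moving action can pay only $b$ per round plus switching costs.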

4.1.1 Proof of Theorem 2

To prove Theorem 2, we prove the stronger claim that $CR_{i+1}^\alpha(A) + R_i(A)/T \ge \gamma$.

Consider a system with costs $c^t = f_1^\alpha$ if $t$ is odd and $f_2^\alpha$ if $t$ is even. Then $C_i(A) \ge (a/2 + b)T +
a\sum_{t=1}^{T} (-1)^t D(t + i)$. The static optimum is not worse than the scheme that sets $x^t = 1/(2\bar\alpha)$ for
all $t$, which has total cost no more than $(a/2 + b)T + \|1/2\|$. The $\alpha$-unfair dynamic optimum for
$C_{i+1}$ is not worse than the scheme that sets $x^t = 0$ if $t$ is odd and $x^t = 1/\bar\alpha$ if $t$ is even, which has
total $\alpha$-unfair cost at most $(b + 1)T$. Hence

$$R_i(A) \ge a\sum_{t=1}^{T} (-1)^t D(t + i) - \|1/2\|,$$

$$CR_{i+1}^\alpha(A) \ge \frac{(a/2 + b)T + a\sum_{t=1}^{T} (-1)^t D(t + i + 1)}{(b + 1)T}.$$

Thus, since $D(t) \in [-1/2, 1/2]$,

$$(b + 1)T\left(CR_{i+1}^\alpha(A) + R_i(A)/T\right) + (b + 1)\|1/2\| - (a/2 + b)T$$
$$\ge a\sum_{t=1}^{T} (-1)^t \left(D(t + i + 1) + (b + 1)D(t + i)\right)$$
$$= ab\sum_{t=1}^{T} (-1)^t D(t + i) - a\left(D(i + 1) + (-1)^{T+1} D(T + i + 1)\right)$$
$$\ge -abT/2 - a.$$

To establish the claim, it is then sufficient that $(a/2 + b)T - (b + 1)\|1/2\| - abT/2 - a \ge \gamma T(b + 1)$.
For $b = 1/2$ and $a = 30\gamma + 2 + \|6\|$, this holds for $T \ge 5$.

4.1.2 Proof of Theorem 3

To prove Theorem 3, we again prove the stronger claim $CR_0^\alpha(A) + R_0(A)/T \ge \gamma$.

Consider the cost function sequence with $c^t(\cdot) = f_2^0$ for $\mathbb{E}[x^t] \le 1/2$ and $c^t(\cdot) = f_1^0$ otherwise,
on decision space $[0, 1]$, where $x^t$ is the (random) choice of the algorithm at round $t$. Here the
expectation is taken over the marginal distribution of $x^t$ conditioned on $c^1, \dots, c^{t-1}$, averaging out
the dependence on the realizations of $x^1, \dots, x^{t-1}$. Notice that this sequence can be constructed
by an oblivious adversary before the execution of the algorithm.

The following lemma is proven in Appendix B.

Lemma 5. Given any algorithm, the sequence of cost functions chosen by the above oblivious
adversary gives the following:

$$R_0(A),\ R'_0(A) \ge a\sum_{t=1}^{T} \left|1/2 - \mathbb{E}[x^t]\right| - \|1/2\|, \quad (4)$$

$$CR_0^\alpha(A) \ge \frac{(a/2 + b)T + a\sum_{t=1}^{T} \left|1/2 - \mathbb{E}[x^t]\right|}{(b + \|\alpha\|)T}. \quad (5)$$

From Equation (4) and Equation (5) in Lemma 5, we have $CR_0^\alpha(A) + R_0(A)/T \ge \frac{a/2 + b}{b + \|\alpha\|} - \frac{\|1/2\|}{T}$.
For $a > 2\gamma(b + \|\alpha\|)$, the right hand side is bigger than $\gamma$ for sufficiently large $T$, which establishes
the claim.
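
For a deterministic algorithm the expectation is just the algorithm's next action, so this adversary can be simulated directly. The sketch below makes that simplification (a randomized algorithm would need the expected action instead) and assumes the algorithm is given as a function of the past cost labels; the names are illustrative.

```python
def run_adversary(algorithm, T, a=1.0, b=0.0):
    """Play T rounds on [0, 1] against a deterministic algorithm.

    Label 2 means the round's cost is b + a * (1 - x); label 1 means b + a * x.
    """
    labels, total = [], 0.0
    for _ in range(T):
        x = algorithm(labels)             # deterministic, so E[x^t] = x^t
        label = 2 if x <= 0.5 else 1      # pick the function that is costly at x
        total += b + a * (1.0 - x) if label == 2 else b + a * x
        labels.append(label)
    return total, labels


# An algorithm that always jumps to the minimizer of the previous round's cost.
chase = lambda labels: 1.0 if labels and labels[-1] == 2 else 0.0
total, labels = run_adversary(chase, T=10)
print(total, labels)
```

Every round costs the algorithm at least $a/2 + b$ regardless of its choice, which is the per-round quantity that drives both bounds in Lemma 5.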


4.1.3 Proof of Theorem 4

Let $a = \|1\|/2$ and $b = 0$. Let $M = 4\alpha\gamma\|1\|/a = 8\alpha\gamma$. For $T \gg M$, divide $[1, T]$ into segments of
length $3M$. For the last $2M$ of each segment, set $c^t = f_1^\alpha$. This ensures that the static optimal
solution is $x = 0$. Moreover, if $c^t$ is either $f_1^\alpha$ or $f_2^\alpha$ for all $t$ in the first $M$ time steps, then the
optimal dynamic solution is also $x^t = 0$ for the last $2M$ time steps.

Consider a solution for which each segment has non-negative regret. Then to obtain sublinear
regret, for any positive threshold $\epsilon$, at least $T/(3M) - o(T)$ of these segments must have regret
below $\epsilon\|1/\bar\alpha\|$. We then show that these segments must have high competitive ratio. To make this
more formal, consider (without loss of generality) the single segment $[1, 3M]$.

Let $\bar c$ be such that $\bar c^t = f_2^\alpha$ for all $t \in [1, M]$ and $\bar c^t = f_1^\alpha$ for $t > M$. Then the optimal dynamic
solution on $[1, 3M]$ is $x^t = \mathbf{1}_{t \le M}/\bar\alpha$, which has total cost $2\alpha\|1/\bar\alpha\|$ consisting entirely of switching
costs.

The following lemma is proven in Appendix C.

Lemma 6. For any $\delta \in (0, 1/\bar\alpha)$ and integer $\tau > 0$, there exists an $\epsilon(\delta, \tau) > 0$ such that, if
$c^t = f_2^\alpha$ for all $1 \le t \le \tau$ and $x^t > \delta$ for any $1 \le t \le \tau$, then there exists an $m \le \tau$ such that
$C_1(x, m) - C_1(OPT_s, m) > \epsilon(\delta, \tau)\|1/\bar\alpha\|$.

Let $\delta = 1/(5\bar\alpha) \in (0, 1)$. For any decisions such that $x^t < \delta$ for all $t \in [1, M]$, the operating cost
of $x$ under $\bar c$ is at least $3\alpha\gamma\|1/\bar\alpha\|$. Let the adversary choose a $c$ on this segment such that $c^t = f_2^\alpha$
until (a) the first time $t_0 < M$ that the algorithm’s solution $x$ satisfies $C_1(x, t_0) - C_1(OPT_s, t_0) >
\epsilon(\delta, M)\|1/\bar\alpha\|$, or (b) $t = M$. After this, it chooses $c^t = f_1^\alpha$.

In case (a), $C_1(x, 3M) - C_1(OPT_s, 3M) > \epsilon(\delta, M)\|1/\bar\alpha\|$ by Lemma 6, since $OPT_s$ incurs no
cost after $t_0$. Moreover $C_1(x, 3M) \ge C_1(OPT_d, 3M)$.

In case (b), $C_1(x, 3M)/C_1(OPT_d, 3M) \ge 3\alpha\gamma\|1/\bar\alpha\|/(2\alpha\|1/\bar\alpha\|) = 3\gamma/2$.

To complete the argument, consider all segments. Let $g(T)$ be the number of segments for
which case (a) occurs. The regret then satisfies

$$R'_1(A) \ge \epsilon(\delta, M)\|1/\bar\alpha\|\,g(T).$$

Similarly, the ratio of the total cost to that of the optimum is at least

$$\frac{C_1(x, T)}{C_1(OPT_d, T)} \ge \frac{[T/(3M) - g(T)]\,3\alpha\gamma\|1/\bar\alpha\|}{[T/(3M)]\,2\alpha\|1/\bar\alpha\|} = \frac{3}{2}\gamma\left(1 - \frac{3Mg(T)}{T}\right).$$

If $g(T) = \Omega(T)$, then $R'_1(A) = \Omega(T)$. Conversely, if $g(T) = o(T)$, then for sufficiently large $T$,
$3Mg(T)/T < 1/3$ and so $CR_1^\alpha(A) \ge \gamma$.

5 Balancing Regret and the Competitive Ratio 

Given the above incompatibility, it is necessary to reevaluate the goals for algorithm design. In 
particular, it is natural now to seek tradeoffs such as being able to obtain $\epsilon T$ regret for arbitrarily
small $\epsilon$ while remaining $O(1)$-competitive, or being $\log\log T$-competitive while retaining sublinear
regret.

To this end, in the following we present a novel algorithm, Randomly Biased Greedy (RBG),
which can achieve simultaneous bounds on regret $R'_0$ and competitive ratio $CR_1$, when the decision
space $F$ is one-dimensional. The one-dimensional setting is the natural starting point for seeking
such a tradeoff given that the proofs of the incompatibility results all focus on one-dimensional
examples and that the one-dimensional case has recently been of practical significance.
The algorithm takes a norm $N$ as its input:

Algorithm 2 (Randomly Biased Greedy, RBG($N$)).

Given a norm $N$, define $w^0(x) = N(x)$ for all $x$ and $w^t(x) = \min_y \{w^{t-1}(y) + c^t(y) + N(x - y)\}$.
Generate a random number $r$ uniformly in $(-1, 1)$. For each time step $t$, go to the state $x^t$ which
minimizes $Y^t(x^t) = w^{t-1}(x^t) + rN(x^t)$.
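
The following Python sketch illustrates the two steps of RBG on a discretized one-dimensional decision space. It is an illustration under assumptions, not the paper's implementation: actions are restricted to a finite `grid`, the input norm is taken to be `theta * abs(x)`, the bias `r` can be passed in for reproducibility, and Python's `random.uniform` samples the closed interval, which differs from the open interval only on a measure-zero event.

```python
import random


def rbg(costs, grid, theta=1.0, r=None):
    """Grid sketch of RBG with norm N(x) = theta * |x|.

    Returns ([x^1, ..., x^{T+1}], r), where x^{t+1} is chosen after c^t is revealed.
    """
    if r is None:
        r = random.uniform(-1.0, 1.0)       # single random bias, drawn once
    N = lambda z: theta * abs(z)
    n = len(grid)
    w = [N(x) for x in grid]                 # work function w^0
    actions = []
    for c in costs:
        # act: minimize Y(x) = w(x) + r * N(x) using the current work function
        actions.append(grid[min(range(n), key=lambda k: w[k] + r * N(grid[k]))])
        # update: w_new(x) = min_y { w(y) + c(y) + N(x - y) }
        served = [w[k] + c(grid[k]) for k in range(n)]
        w = [min(served[k] + N(x - grid[k]) for k in range(n)) for x in grid]
    actions.append(grid[min(range(n), key=lambda k: w[k] + r * N(grid[k]))])
    return actions, r


c = lambda x: 4.0 * abs(x - 1.0)
print(rbg([c], [0.0, 0.5, 1.0], theta=1.0, r=0.0)[0])
print(rbg([c], [0.0, 0.5, 1.0], theta=10.0, r=0.0)[0])
```

In this toy run the algorithm moves to the far end of the interval when `theta` is 1 but stays put when `theta` is 10, which illustrates how a larger input norm makes the actions change less.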

RBG makes very limited use of randomness: it parameterizes its
“bias” using a single random $r \in (-1, 1)$. It then chooses actions to greedily minimize its “work
function” $w^t(x)$.

As stated, RBG performs well for the $\alpha$-unfair competitive ratio, but performs poorly for the
regret. Theorem 7 shows that RBG($\|\cdot\|$) is 2-competitive,^4 and hence has at best linear regret.
However, the key idea behind balancing regret and competitive ratio is to run RBG with a “larger”
norm to encourage its actions to change less. This can make the coefficient of regret arbitrarily
small, at the expense of a larger (but still constant) competitive ratio.

Theorem 7. For a SOCO problem in a one-dimensional normed space $\|\cdot\|$, running RBG($N$)
with a one-dimensional norm having $N(1) = \theta\|1\|$ as input (where $\theta \ge 1$) attains an $\alpha$-unfair
competitive ratio $CR_1^\alpha$ of $(1 + \theta)/\min\{\theta, \alpha\}$ and a regret $R'_0$ of $O(\max\{T/\theta, \theta\})$.

* Note that this improves the best known competitive ratio for this setting from 3 (achieved by Lazy Capacity
Provisioning) to 2.

Note that Theorem 7 holds for the usual metrics of MTS and OCO, which are the “most
incompatible” case since the cost functions are mismatched (cf. Theorem 2). Thus, the conclusion
of Theorem 7 still holds when $R_0$ or $R_1$ is considered in place of $R_0$.

The best $CR_1^\alpha$, $1 + 1/\alpha$, achieved by RBG is obtained with $N(\cdot) = \alpha\|\cdot\|$. However, choosing
$N(\cdot) = \|\cdot\|/\epsilon$ for arbitrarily small $\epsilon$ gives $\epsilon T$-regret at the cost of a larger $CR_1^\alpha$. Similarly, if $T$ is
known in advance, choosing $N(1) = \theta(T)$ for some increasing function $\theta(\cdot)$ achieves an $O(\theta(T))$ $\alpha$-unfair
competitive ratio and $O(\max\{T/\theta(T), \theta(T)\})$ regret; taking $\theta(T) = O(\sqrt{T})$ gives $O(\sqrt{T})$ regret,
which is optimal for arbitrary convex costs [m]. If $T$ is not known in advance, $N(1)$ can increase
in $t$, and bounds similar to those in Theorem 7 still hold.
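The tradeoff can be tabulated numerically. The sketch below simply evaluates the two expressions of Theorem 7 for a few choices of $\theta$; the function name `theorem7_bounds` is hypothetical, and the second value is only the quantity inside the $O(\cdot)$, so constant factors are ignored.

```python
import math


def theorem7_bounds(theta, alpha, T):
    """Evaluate the expressions appearing in Theorem 7.

    Returns (competitive_ratio, regret_scale), where
      competitive_ratio = (1 + theta) / min(theta, alpha)
      regret_scale      = max(T / theta, theta)   # argument of the O(.) bound
    Assumes theta >= 1, as in the theorem statement.
    """
    if theta < 1:
        raise ValueError("Theorem 7 is stated for theta >= 1")
    competitive_ratio = (1 + theta) / min(theta, alpha)
    regret_scale = max(T / theta, theta)
    return competitive_ratio, regret_scale


if __name__ == "__main__":
    T, alpha = 10_000, 1.0
    for theta in (1.0, 10.0, math.sqrt(T), float(T)):
        cr, reg = theorem7_bounds(theta, alpha, T)
        print(f"theta={theta:>8.1f}  CR={cr:>8.1f}  regret scale={reg:>8.1f}")
```

Under these formulas, increasing $\theta$ up to $\sqrt{T}$ lowers the regret scale while the competitive ratio grows linearly in $\theta$; beyond $\sqrt{T}$ both quantities grow.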

Proof of Theorem 7

To prove Theorem 7, we derive a more general tool for designing algorithms that simultaneously
balance regret and the $\alpha$-unfair competitive ratio. In particular, for any algorithm $A$, let the
operating cost be $OC(A) = \sum_{t=1}^{T} c^t(x^{t+1})$ and the switching cost be $SC(A) = \sum_{t=1}^{T} \|x^{t+1} - x^t\|$,
so that $C_1(A) = OC(A) + SC(A)$. Define $OPT_N$ to be the dynamic optimal solution under the
norm $N(1) = \theta\|1\|$ ($\theta \ge 1$) with $\alpha = 1$. The following lemma is proven in Appendix D.

Lemma 8. Consider a one-dimensional SOCO problem with norm $\|\cdot\|$ and an online algorithm
$A$ which, when run with norm $N$, satisfies $OC(A(N)) \le OPT_N + O(1)$ along with $SC(A(N)) \le
\beta\,OPT_N + O(1)$ with $\beta = O(1)$. Fix a norm $N$ such that $N(1) = \theta\|1\|$ with $\theta \ge 1$. Then $A(N)$ has
$\alpha$-unfair competitive ratio $CR_1^\alpha(A(N)) = (1 + \beta)\max\{\theta/\alpha, 1\}$ and regret $R_0(A(N)) = O(\max\{\beta T, (1 +
\beta)\theta\})$ for the original SOCO problem with norm $\|\cdot\|$.

Theorem 7 then follows from the following lemmas, proven in Appendices E and F.

Lemma 9. Given a SOCO problem with norm $\|\cdot\|$, $E[OC(RBG(N))] \le OPT_N$.

Lemma 10. Given a one-dimensional SOCO problem with norm $\|\cdot\|$,

$E[SC(RBG(N))] \le OPT_N/\theta$ with probability 1.
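To see how the constants of Theorem 7 arise, one can substitute the switching-cost factor of Lemma 10 into Lemma 8. The derivation below is a sketch under the assumption that Lemmas 9 and 10 supply the hypotheses of Lemma 8 (in expectation) with $\beta = 1/\theta$; that identification is a reading of the statements above, not a step quoted from the proof.

```latex
% Assumption: Lemma 8 is applied with \beta = 1/\theta (from Lemma 10).
\begin{align*}
CR_1^{\alpha}
  &= (1+\beta)\max\{\theta/\alpha,\,1\}
   = \frac{1+\theta}{\theta}\cdot\frac{\theta}{\min\{\theta,\alpha\}}
   = \frac{1+\theta}{\min\{\theta,\alpha\}}, \\
R_0
  &= O\bigl(\max\{\beta T,\,(1+\beta)\theta\}\bigr)
   = O\bigl(\max\{T/\theta,\,1+\theta\}\bigr)
   = O\bigl(\max\{T/\theta,\,\theta\}\bigr).
\end{align*}
% The last step uses \theta \ge 1, so that 1 + \theta \le 2\theta.
```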

6 Concluding Remarks 

This paper studies the relationship between regret and competitive ratio when applied to the class 
of SOCO problems. It shows that these metrics, from the learning and algorithms communities 
respectively, are fundamentally incompatible, in the sense that algorithms with sublinear regret 
must have infinite competitive ratio, and those with constant competitive ratio have at least linear 
regret. Thus, the choice of performance measure significantly affects the style of algorithm design. It 
also introduces a generic approach for balancing these competing metrics, exemplified by a specific 
algorithm, RBG. 

There are a number of interesting directions that this work motivates. In particular, the SOCO 
formulation is still under-explored, and many variations of the formulation discussed here are still 
not understood. For example, is it possible to trade off between regret and the competitive ratio
in bandit versions of SOCO? More generally, the message from this paper is that regret and the 
competitive ratio are incompatible within the formulation of SOCO. It is quite interesting to try to 
understand how generally this holds. For example, does the “incompatibility result” proven here 
extend to settings where the cost functions are random instead of adversarial, e.g., variations of 
SOCO such as $k$-armed bandit problems with switching costs?

A Proof of Proposition 1

Recall that, by assumption, $\|\nabla c^t(\cdot)\|_2$ is bounded. So, let us define $D$ such that $\|\nabla c^t(\cdot)\|_2 \le D$.
Next, due to the fact that all norms are equivalent in a finite dimensional space, there exist $m, M > 0$
such that for every $x$, $m\|x\|_a \le \|x\|_b \le M\|x\|_a$. Combining these facts, we can bound the switching
cost incurred by an OGD algorithm as follows:

$$\sum_{t=1}^{T} \|x^{t+1} - x^t\| \le M \sum_{t=1}^{T} \|x^{t+1} - x^t\|_2 \le M \sum_{t=1}^{T} \eta_t \|\nabla c^t(x^t)\|_2 \le M D \sum_{t=1}^{T} \eta_t.$$


The second inequality comes from the fact that projection to a convex set under the Euclidean norm 
is nonexpansive, i.e., $\|P(x) - P(y)\|_2 \le \|x - y\|_2$. Thus, the switching cost causes an additional
regret of $\sum_{t=1}^{T} \eta_t = O(\rho_1(T))$ for the algorithm, completing the proof.
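The switching-cost bound can be checked numerically in one dimension, where the absolute value serves as the norm and $M = 1$. The sketch below runs projected online gradient descent on an interval and accumulates $\sum_t |x^{t+1} - x^t|$; the helper name `ogd_switching_cost`, the example gradient and the step sizes are illustrative assumptions.

```python
def ogd_switching_cost(grads, etas, lo, hi, x0):
    """Total movement of projected online gradient descent on [lo, hi].

    grads : list of callables, grads[t] returns the gradient of the t-th cost
    etas  : list of step sizes eta_t
    Each step is x <- clip(x - eta_t * grad_t(x), lo, hi); clipping is the
    Euclidean projection onto the interval and is nonexpansive.
    """
    x = x0
    total = 0.0
    for grad, eta in zip(grads, etas):
        x_next = min(hi, max(lo, x - eta * grad(x)))
        total += abs(x_next - x)
        x = x_next
    return total


if __name__ == "__main__":
    # Cost x^2 - x on [0, 1]: gradient 2x - 1, so |gradient| <= D = 1.
    etas = [0.5, 0.25, 0.125]
    grads = [lambda x: 2 * x - 1] * len(etas)
    moved = ogd_switching_cost(grads, etas, 0.0, 1.0, x0=1.0)
    print(moved, "<=", 1.0 * sum(etas))
```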


B Proof of Lemma 5

Recall that the oblivious adversary chooses $c^t(x) = a(1 - x) + b$ when $E[x^t] \le 1/2$ and $c^t(x) = ax + b$ otherwise,
where $x^t$ is the (random) choice of the algorithm at round $t$. Therefore,

$$C_0(A) \ge E\left[\sum_{t=1}^{T} \begin{cases} a(1 - x^t) + b & \text{if } E[x^t] \le 1/2 \\ a x^t + b & \text{otherwise} \end{cases}\right]$$

$$= E\left[bT + a\sum_{t=1}^{T} \left(1/2 + (1/2 - x^t)\,\mathrm{sgn}(1/2 - E[x^t])\right)\right]$$

$$= bT + a\sum_{t=1}^{T} \left(1/2 + (1/2 - E[x^t])\,\mathrm{sgn}(1/2 - E[x^t])\right)$$

$$= (a/2 + b)T + a\sum_{t=1}^{T} \left|1/2 - E[x^t]\right|,$$

where $\mathrm{sgn}(x) = 1$ if $x > 0$ and $-1$ otherwise. The static optimum is not worse than the scheme
that sets $x^t = 1/2$ for all $t$, which has total cost $(a/2 + b)T + \|1/2\|$. This establishes Equation (4).

The dynamic scheme which chooses $x^{t+1} = 0$ if $c^t(x) = ax + b$ and $x^{t+1} = 1$ if $c^t(x) = a(1 - x) + b$ has total
$\alpha$-unfair cost not more than $(b + \|\alpha\|)T$. This establishes Equation (5).
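For a deterministic algorithm, $E[x^t] = x^t$, and the operating-cost identity in the derivation above can be checked directly. The sketch assumes the two adversarial cost functions are $a(1 - x) + b$ and $ax + b$, as read off from that derivation; both helper names are hypothetical.

```python
def adversary_operating_cost(xs, a, b):
    """Operating cost paid by a deterministic sequence xs against the adversary.

    The adversary charges a*(1 - x) + b when x <= 1/2 and a*x + b otherwise,
    i.e. it always penalizes the side of 1/2 on which the learner sits.
    """
    total = 0.0
    for x in xs:
        if x <= 0.5:
            total += a * (1 - x) + b
        else:
            total += a * x + b
    return total


def closed_form(xs, a, b):
    """(a/2 + b) * T + a * sum_t |1/2 - x^t|, the last line of the derivation."""
    return (a / 2 + b) * len(xs) + a * sum(abs(0.5 - x) for x in xs)


if __name__ == "__main__":
    xs = [0.0, 0.25, 1.0]
    print(adversary_operating_cost(xs, a=2.0, b=1.0), closed_form(xs, a=2.0, b=1.0))
```

The two quantities agree for any deterministic sequence, and the excess over $(a/2 + b)T$ grows with the distance of each action from $1/2$.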

C Proof of Lemma 6

Proof. We only consider the case that $d = 1$; other cases are analogous. We prove the contrapositive
(that if $C_1(x; m) - C_1(OPT_s, m) < \epsilon\|1\|$ for all $m$, then $x^t < \delta$ for all $t \in [1, \tau]$). We consider the
case that $x^t$ are non-decreasing; if not, the switching and operating cost can both be reduced by
setting $(x^t)' = \max_{t' \le t} x^{t'}$.

Note that $OPT_s$ sets $x^t = 0$ for all $t$, which implies $C_1(OPT_s, m) = am$, and that

$$C_1(x; m) = x^m\|1\| - a\sum_{i=1}^{m} x^i + am.$$

Thus, we want to show that if $x^m\|1\| - a\sum_{i=1}^{m} x^i < \epsilon\|1\|$ for all $m \le \tau$, then $x^t < \delta$ for all $t \in [1, \tau]$.

Define $f_i(\cdot)$ inductively by $f_1(y) = 1/(1 - y)$, and

$$f_i(y) = \frac{1}{1 - y}\left(1 + y\sum_{j=1}^{i-1} f_j(y)\right).$$

If $y < 1$, then $\{f_i(y)\}$ are increasing in $i$. Notice that $\{f_i\}$ satisfy

$$f_m(y)(1 - y) - y\sum_{i=1}^{m-1} f_i(y) = 1.$$

Expanding the first term gives that for any $\hat{\epsilon}$,

$$\hat{\epsilon} f_m(a/\|1\|) - \frac{a}{\|1\|}\sum_{i=1}^{m} \hat{\epsilon} f_i(a/\|1\|) = \hat{\epsilon}.$$

If for some $\hat{\epsilon} > 0$,

$$x^m - \frac{a}{\|1\|}\sum_{i=1}^{m} x^i < \hat{\epsilon} \qquad (6)$$

for all $m \le \tau$, then by induction $x^i < \hat{\epsilon} f_i(a/\|1\|) \le \hat{\epsilon} f_\tau(a/\|1\|)$ for all $i \le \tau$, where the last
inequality uses the fact that $a < \|1\|$ and hence $\{f_i(a/\|1\|)\}$ are increasing in $i$.

Observe that the left hand side of Equation (6) is $(C_1(x; m) - C_1(OPT_s, m))/\|1\|$. Define
$\epsilon = \hat{\epsilon} = \delta/(2 f_\tau(a/\|1\|))$. Assuming we have $(C_1(x; m) - C_1(OPT_s, m)) < \epsilon\|1\|$ for all $m$, then
Equation (6) holds for all $m$, and thus $x^t < \hat{\epsilon} f_\tau(a/\|1\|) = \delta/2 < \delta$ for all $t \in [1, \tau]$. □
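The identity used in the induction can be verified with exact rational arithmetic. The sketch assumes the recursion $f_i(y) = (1 + y\sum_{j<i} f_j(y))/(1 - y)$, which is the form forced by $f_1(y) = 1/(1 - y)$ together with the identity $f_m(y)(1 - y) - y\sum_{i<m} f_i(y) = 1$; the name `f_sequence` is hypothetical.

```python
from fractions import Fraction


def f_sequence(y, m):
    """Return [f_1(y), ..., f_m(y)] for the assumed recursion.

    f_i(y) = (1 + y * (f_1(y) + ... + f_{i-1}(y))) / (1 - y), with 0 <= y < 1.
    """
    fs = []
    for _ in range(m):
        fs.append((1 + y * sum(fs)) / (1 - y))
    return fs


if __name__ == "__main__":
    y = Fraction(1, 2)
    fs = f_sequence(y, 5)
    for m in range(1, len(fs) + 1):
        # Identity: f_m(y) * (1 - y) - y * sum_{i < m} f_i(y) = 1.
        assert fs[m - 1] * (1 - y) - y * sum(fs[: m - 1]) == 1
    print(fs)
```

Under this recursion the sequence is geometric, $f_i(y) = (1 - y)^{-i}$, which makes the monotonicity in $i$ for $0 < y < 1$ immediate.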

D Proof of Lemma 8

We first prove the $\alpha$-unfair competitive ratio result. Let $x^1, x^2, \ldots, x^{T+1}$ denote the actions chosen
by algorithm ALG when running on a normed space with $N = \|\cdot\|_{ALG}$ as input. Let $y^1, y^2, \ldots$
be the actions chosen by the optimal dynamic offline algorithm, which pays $\alpha$ times more for
switching costs, on a normed space with $\|\cdot\|$ (i.e., $OPT_d^\alpha$). Similarly, let $z^1, z^2, \ldots$ be the
actions chosen by the optimal solution on a normed space with $\|\cdot\|_{ALG}$, namely $OPT_{\|\cdot\|_{ALG}}$ (without
an unfairness cost). Recall that we have $C_1(ALG) = \sum_{t=1}^{T} \left(c^t(x^{t+1}) + \|x^{t+1} - x^t\|\right)$, $OPT_d^\alpha =
\sum_{t=1}^{T} \left(c^t(y^t) + \alpha\|y^t - y^{t-1}\|\right)$, and $OPT_{\|\cdot\|_{ALG}} = \sum_{t=1}^{T} \left(c^t(z^t) + \|z^t - z^{t-1}\|_{ALG}\right)$. By the assumptions
in our lemma, we know that $C_1(ALG) \le (1 + \beta)OPT_{\|\cdot\|_{ALG}} + O(1)$. Moreover,

$$OPT_d^\alpha = \sum_{t=1}^{T} \left(c^t(y^t) + \alpha\|y^t - y^{t-1}\|\right) \ge \frac{\sum_{t=1}^{T} \left(c^t(y^t) + \|y^t - y^{t-1}\|_{ALG}\right)}{\max\{1, \theta/\alpha\}} \ge \frac{OPT_{\|\cdot\|_{ALG}}}{\max\{1, \theta/\alpha\}}.$$

The first inequality holds since $\|\cdot\|_{ALG} = \theta\|\cdot\|$ with $\theta \ge 1$. Therefore, $C_1(ALG) \le (1 +
\beta)\max\{1, \theta/\alpha\}\,OPT_d^\alpha + O(1)$.

We now prove the regret bound. Let $d_{max}$ denote the diameter of the decision space (i.e.,
the length of the interval). Recall that $C_0(ALG) = \sum_{t=1}^{T} \left(c^t(x^t) + \|x^t - x^{t-1}\|\right)$ and $OPT_s =
\min_x \sum_{t=1}^{T} c^t(x)$. Then we know that $C_0(ALG) \le C_1(ALG) + D\sum_{t=1}^{T} \|x^{t+1} - x^t\| + \|d_{max}\|$ for some
constant $D$ by Equation (1). Based on our assumptions, we have $\sum_t c^t(x^{t+1}) \le OPT_{\|\cdot\|_{ALG}} + O(1)$
and $\sum_t \|x^{t+1} - x^t\| \le \beta\,OPT_{\|\cdot\|_{ALG}} + O(1)$. For convenience, we let $E = D + 1 = O(1)$. Then
$C_0(ALG)$ is at most:

$$\sum_{t=1}^{T} \left(c^t(x^{t+1}) + E\|x^{t+1} - x^t\|\right) + \|d_{max}\| + O(1) \le (1 + E\beta)OPT_{\|\cdot\|_{ALG}} + \|d_{max}\| + O(1)$$

$$\le (1 + E\beta)(OPT_s + \|d_{max}\|_{ALG}) + \|d_{max}\| + O(1).$$

Therefore, we get that the regret $C_0(ALG) - OPT_s$ is at most

$$E\beta\,OPT_s + \|d_{max}\|(1 + E(1 + \beta)\theta) + O(1) = O(\beta\,OPT_s + (1 + \beta)\theta) = O(\max\{\beta\,OPT_s, (1 + \beta)\theta\}).$$

In the OCO setting, the cost functions $c^t(x)$ are bounded from below by 0 and are typically
bounded from above by a value independent of $T$ (e.g., [17, 21]), so that $OPT_s = O(T)$. This
immediately gives the result that the regret is at most $O(\max\{\beta T, (1 + \beta)\theta\})$.


E Proof of Lemma 9

In this section, we argue that the expected operating cost of RBG (when evaluated under $\|\cdot\|$)
with input norm $N(\cdot) = \theta\|\cdot\|$, $\theta \ge 1$, is at most the cost of the optimal dynamic offline algorithm
under norm $N$ (i.e., $OPT_N$). Let $M$ denote our decision space. Before proving this result, let us
introduce a useful lemma. Let $x^1, x^2, \ldots, x^{T+1}$ denote the actions chosen by RBG (similarly, let
$x^1_{OPT}, \ldots, x^{T+1}_{OPT}$ denote the actions chosen by $OPT_N$).

Lemma 11. $w^t(x^{t+1}) = w^{t-1}(x^{t+1}) + c^t(x^{t+1})$.

Proof. We know that for any state $x \in M$, we have $w^t(x) = \min_{y \in M}\{w^{t-1}(y) + c^t(y) + \theta\|x - y\|\}$.
Suppose instead $w^t(x^{t+1}) = w^{t-1}(y) + c^t(y) + \theta\|x^{t+1} - y\|$ for some $y \ne x^{t+1}$. Then

$$Y^{t+1}(x^{t+1}) = w^t(x^{t+1}) + \theta r\|x^{t+1}\|$$
$$= w^{t-1}(y) + c^t(y) + \theta\|x^{t+1} - y\| + \theta r\|x^{t+1}\|$$
$$> w^{t-1}(y) + c^t(y) + \theta r\|y\|$$
$$\ge Y^{t+1}(y),$$

which contradicts $x^{t+1} = \arg\min_{y \in M} Y^{t+1}(y)$. Here the strict inequality uses $|r| < 1$ and $y \ne x^{t+1}$,
and the last step uses $w^t(y) \le w^{t-1}(y) + c^t(y)$. Therefore $w^t(x^{t+1}) = w^{t-1}(x^{t+1}) + c^t(x^{t+1})$. □

Now let us prove the expected operating cost of RBG is at most the total cost of the optimal
solution, $OPT_N$:

$$Y^{t+1}(x^{t+1}) - Y^t(x^t) \ge Y^{t+1}(x^{t+1}) - Y^t(x^{t+1})$$
$$= (w^t(x^{t+1}) + \theta r\|x^{t+1}\|) - (w^{t-1}(x^{t+1}) + \theta r\|x^{t+1}\|)$$
$$= c^t(x^{t+1}).$$

Lemma 9 is proven by summing up the above inequality for $t = 1, \ldots, T$, since
$Y^{T+1}(x^{T+1}) \le Y^{T+1}(x^{T+1}_{OPT})$ and $E[Y^{T+1}(x^{T+1}_{OPT})] = OPT_N$ by $E[r] = 0$.

Note that this approach also holds when the decision space $F \subset \mathbb{R}^n$ for $n > 1$.

F Proof of Lemma 10


To prove Lemma 10, we make a detour and consider a version of the problem with a discrete state
space. We first show that on such spaces the lemma holds for a discretization of RBG, which
we name DRBG. Next, we show that as the discretization becomes finer, the solution (and hence
switching cost) of DRBG approaches that of RBG. The lemma is then proven by showing that the
optimal cost of the discrete approximation approaches the optimal cost of the continuous problem.

To begin, define a discrete variant of SOCO where the number of states is finite as follows.
Actions can be chosen from $m$ states, denoted by the set $M = \{x_1, \ldots, x_m\}$, and the distances
$\delta = x_{i+1} - x_i$ are the same for all $i$. Without loss of generality we define $x_1 = 0$. Consider the
following algorithm.

Algorithm 3 (Discrete RBG, DRBG(N)).

Given a norm $N$ and discrete states $M = \{x_1, \ldots, x_m\}$, define $w^0(x) = N(x)$ and $w^t(x) =
\min_{y \in M}\{w^{t-1}(y) + c^t(y) + N(x - y)\}$ for all $x \in M$. Generate a random number $r \in (-1, 1)$.
For each time step $t$, go to the state $x^t$ which minimizes $Y^t(x^t) = w^{t-1}(x^t) + rN(x^t)$.

Note that DRBG looks nearly identical to RBG except that the states are discrete. DRBG is 
introduced only for the proof and need never be implemented; thus we do not need to worry about 
the computational issues when the number of states m becomes large. 

F.1 Bounding the Switching Cost of DRBG

We now argue that the expected switching cost of DRBG (evaluated under the norm $\|\cdot\|$ and run
with input norm $N(\cdot) = \theta\|\cdot\|$) is at most the total cost of the optimal solution in the discrete
system (under norm $N$). We first prove a couple of useful lemmas regarding the work function.
The first lemma states that if the optimal way to get to some state $x$ at time $t$ is to come to state
$y$ in the previous time step, incur the operating cost at state $y$, and travel from state $y$ to state $x$,
then in fact the optimal way to get to state $y$ at time $t$ is to come to $y$ at the previous time step
and incur the operating cost at state $y$.

Lemma 12. If $\exists x, y : w^t(x) = w^{t-1}(y) + c^t(y) + \theta\|x - y\|$, then $w^t(y) = w^{t-1}(y) + c^t(y)$.

Proof. Suppose towards a contradiction that $w^t(y) < w^{t-1}(y) + c^t(y)$. Then we have:

$$w^t(y) + \theta\|x - y\| < w^{t-1}(y) + c^t(y) + \theta\|x - y\| = w^t(x) \le w^t(y) + \theta\|x - y\|$$

(since one way to get to state $x$ at time $t$ is to get to state $y$ at time $t$ and travel from $y$ to $x$). This
is a contradiction, which proves the lemma. □

The second lemma we show regarding the work function is as follows. 

Lemma 13. Suppose there is some state $x$ for which $w^t(x) = w^{t-1}(x) + c^t(x)$. If $c^t(z) \ge c^t(x)$ for
all $z > x$, then we have $w^t(z) \ge w^{t-1}(z) + c^t(x)$ for all $z > x$ (similarly, if $c^t(z) \ge c^t(x)$ for all
$z < x$, then we have $w^t(z) \ge w^{t-1}(z) + c^t(x)$ for all $z < x$).

Proof. Assume without loss of generality that the entries in the cost vector satisfy $c^t(z) \ge c^t(x)$
for all $z > x$. Let $z$ be any state such that $z > x$, and assume towards a contradiction that
$w^t(z) < w^{t-1}(z) + c^t(x)$. The optimal way to get to $z$ at time step $t$, $w^t(z)$, must go through some
point $j$ in the previous time step and incur the operating cost at $j$. If $j > x$, then we know

$$w^{t-1}(j) + c^t(x) + \theta\|z - j\| \le w^{t-1}(j) + c^t(j) + \theta\|z - j\| = w^t(z)$$
$$< w^{t-1}(z) + c^t(x) \le w^{t-1}(j) + \theta\|z - j\| + c^t(x),$$

which cannot happen. On the other hand, by Lemma 12, if $j \le x$ then we have

$$w^t(x) + \theta\|z - x\| \le w^t(j) + \theta\|1\||x - j| + \theta\|1\||z - x|$$
$$= w^t(j) + \theta\|z - j\| = w^{t-1}(j) + c^t(j) + \theta\|z - j\|$$
$$= w^t(z) < w^{t-1}(z) + c^t(x) \le w^{t-1}(x) + \theta\|z - x\| + c^t(x),$$

which cannot happen either. □

We now argue that, assuming the cost vectors are of a particular form, the algorithm can 
only move from one state to another state (which are independent of the randomness r). More 
specifically, at any particular time step, if the algorithm does ever move, it always moves from a 
unique state x and it always moves to a unique state y {x and y are independent of the randomness 
r and hence remain the same for all values of r that cause the algorithm to move). 

Lemma 14. Fix any time step $t$, and assume the entries in the cost vector $c^t$ are either 0 or $\epsilon$ in
each coordinate (for a sufficiently small $\epsilon > 0$), and are either non-increasing or non-decreasing.
There exist states x, y such that, for any r, we have the following guarantee. At time t, we only 
have the following two possibilities for this value of r: 

1. The algorithm does not move. 

2. The algorithm moves from state x to state y. 

Proof. Fix a time step $t$, and assume without loss of generality the cost vector $c^t = (0, \ldots, 0, \epsilon, \ldots, \epsilon)$
(the case when the entries are non-increasing is symmetric). Let $A = \{j : Y^{t+1}(j) = Y^t(j)\}$ and let
$B = \{j : Y^{t+1}(j) = Y^t(j) + \epsilon\}$. That is, $A$ is the set of states $j$ such that the values $Y^t(j)$ do not
increase (and in particular, the work function values $w^{t-1}(j)$ also do not increase), and $B$ is the set
of states $j$ such that the values $Y^t(j)$ do increase. Note that sets $A$ and $B$ define a partition of the
set of all states and are independent of $r$, since only an increase in the work function value at a state
$j$ can cause an increase in $Y^t(j)$ (note that the work function value is independent of $r$). Moreover,
by Lemma 13 we know that sets $A$ and $B$ have the form $A = \{1, \ldots, i\}$, $B = \{i + 1, \ldots, m\}$ for
some $i$. If set $B$ is empty, then the algorithm never moves at time step $t$ since at least one state’s
work function value must increase for the algorithm to move (this is true for all $r$). Moreover, if
set $A$ is empty, then the algorithm also cannot move at time step $t$ since at least one state’s work
function value must not increase (this is true for all $r$). Hence, we assume that both sets $A$ and $B$
are non-empty, and moreover we assume this for all values of $r$.

Now, fix a value for $r$, and consider the values $Y^t(j)$ for all states $j$ (the shape of $Y^t$ may be
somewhat arbitrary). It is useful to understand how various values of $r$ affect the shape of $Y^t$. As
we increase the value for $r$, the value of $Y^t(j)$ certainly increases for all states, but states $j$ which
are farther to the right have the property that $Y^t(j)$ increases at a faster rate (and hence, states
which are farther to the left have a slower rate of increase). Moreover, as we decrease the value for
$r$, the value of $Y^t(j)$ decreases for all states $j$, but states $j$ which are farther to the right have the
property that $Y^t(j)$ decreases at a faster rate (similarly, states farther to the left have a slower rate
of decrease). These properties hold due to the fact that the dependence on $r$ for $Y^t(j)$ appears in
the term $r \cdot N(j)$, and hence as we change $r$, larger values of $j$ have a larger impact on the value
of $Y^t(j)$ (since $N(j)$ is larger for such states $j$).

With these observations in mind, we again take $r$ to be any fixed value, and we also define
$a_r = \arg\min_{j \in A} Y^t(j)$, $b_r = \arg\min_{j \in B} Y^t(j)$ (recall that we assume $A$ and $B$ are non-empty for
all values of $r$). Note that $a_r$ and $b_r$ may depend on the particular value of $r$, and moreover we
always have $a_r < b_r$ for all values of $r$ (since states in set $B$ are farther to the right). In particular,
the algorithm can only move from a state in $B$ to a state in $A$. In addition, the global minimum
value of $Y^t$ is precisely $\min\{Y^t(a_r), Y^t(b_r)\}$. It is useful to note that, as we increase $r$, $a_r$ and $b_r$
may decrease (i.e., the minimum state in $A$ may move left and the minimum state in $B$ may move
left), while decreasing $r$ can cause $a_r$ and $b_r$ to increase.

Suppose that for every $r$ we have $Y^t(a_r) \ne Y^t(b_r)$. This implies that either $Y^t(a_r) < Y^t(b_r)$ for
all $r$ or $Y^t(a_r) > Y^t(b_r)$ for all $r$. In other words, it is impossible for there to exist $r_1, r_2$ such that
$Y^t(a_{r_1}) < Y^t(b_{r_1})$ while $Y^t(a_{r_2}) > Y^t(b_{r_2})$. If such a scenario existed, it would imply that there
exists some value $r'$ such that $Y^t(a_{r'}) = Y^t(b_{r'})$ (i.e., a crossover point) due to continuity. Hence,
in the case that $Y^t(a_r) \ne Y^t(b_r)$ for all $r$, we must have that either $Y^t(a_r) < Y^t(b_r)$ for every $r$,
or we must have $Y^t(a_r) > Y^t(b_r)$ for every $r$. In either case, it is impossible for the algorithm to
move (i.e., for all values of $r$, the algorithm does not move). To see why, consider the case when
$Y^t(a_r) < Y^t(b_r)$ for every $r$. This means that the state which achieves the global minimum of $Y^t$
(i.e., $\min\{Y^t(a_r), Y^t(b_r)\}$) lies in $A$ for every value of $r$, and since the algorithm never moves from
a state in $A$ after receiving the cost vector $c^t$, the first case is done. A similar argument can be
made in the second case where for all values of $r$ we have $Y^t(a_r) > Y^t(b_r)$. In particular, although
the global minimum lies in the set $B$ for every $r$, in the second case we know that for every $r$ we
have $Y^t(b_r) < Y^t(a_r) \le Y^t(j)$ for every $j \in A$ (we can assume $\epsilon$ is small enough that the new global
minimum after $c^t$ arrives still remains in $B$).

Hence, we assume there exists an $r$ such that $Y^t(a_r) = Y^t(b_r)$. We define the state $y$ to be $a_r$,
and the state $x$ to be $b_r$. Note that $x$ and $y$ are unique. If there are ties for the minimum $Y^t$ value
at the crossover point in set $B$, we take $x$ to be the rightmost such point since states to the left in
$B$ cannot be the minimum for smaller values of $r$ (ties in set $A$ can be dealt with similarly when
defining $y$). Although $a_r$ and $b_r$ depend on $r$, we can claim uniqueness due to the fact that $b_r > a_r$
and hence $Y^t(b_r)$ increases at a faster rate than $Y^t(a_r)$ as we increase $r$ and $Y^t(b_r)$ decreases at a
faster rate than $Y^t(a_r)$ as we decrease $r$. Hence, let $r^*$ denote the unique value at which $Y^t(a_{r^*})$
and $Y^t(b_{r^*})$ meet. Now that $x$ and $y$ are defined, let us see why the lemma holds.

Observe that the only values of $r$ for which the algorithm can move are precisely those when the
algorithm is currently at the minimum state in set $B$, namely $b_r$, and the values $Y^t(a_r) > Y^t(b_r)$
are really close together (in particular, increasing $Y^t(b_r)$ by $\epsilon$ causes $Y^{t+1}(a_r) < Y^{t+1}(b_r)$ for a
carefully chosen, sufficiently small $\epsilon$). Consider the value $r^*$, so that $Y^t(x)$ is the minimum value
in set $B$ and $Y^t(y)$ is the minimum value in set $A$. Observe that for all larger values $r' > r^*$, the
algorithm does not move since the global minimum is in set $A$ for such values $r'$.

Now, consider a slightly smaller value $\hat{r} < r^*$, which is sufficiently close to $r^*$ so that $Y^t(y)$ is
still the minimum value in set $A$ and $Y^t(x)$ is still the minimum value in set $B$, namely $y = a_{\hat{r}}$ and
$x = b_{\hat{r}}$ (if $\hat{r}$ is chosen too small, it is possible that $x$ and $y$ do not satisfy these properties). Choose
$\epsilon$ to be sufficiently small so that $Y^{t+1}(a_{\hat{r}}) = Y^t(b_{\hat{r}}) + \epsilon$. Now, for all values $r' < \hat{r}$, the algorithm
cannot possibly move from any state, since the gap between $Y^t(a_{r'})$ and $Y^t(b_{r'})$ is larger than the
gap between $Y^t(a_{\hat{r}})$ and $Y^t(b_{\hat{r}})$, and $\epsilon$ is sufficiently small that the global minimum remains in $B$
after the cost vector arrives. Now, consider values $r'$ such that $r^* \ge r' \ge \hat{r}$. It is not possible for
another state $j \in B$ to become the minimum state in $B$ for this range, by definition of how we chose
$\hat{r}$ and by definition of $x$ (similar reasoning shows that no other state $j \in A$ can be the minimum
state in $A$ for this range). Hence, every time the algorithm moves, it goes from state $x$ to state $y$.
In particular, $x$ remains the minimum state in set $B$ and $y$ remains the minimum state in set $A$ for
this range, and the algorithm moves from state $x$ to state $y$ for all $r'$ in this range. □

We now prove the main lemma. Let $SC^t = \sum_{\tau=1}^{t} \|x^\tau - x^{\tau-1}\|$ denote the total switching cost
incurred by DRBG up until time $t$, and define the potential function $\phi^t = \frac{1}{2\theta}(w^t(x_1) + w^t(x_m)) -
\frac{\|x_m - x_1\|}{2}$. We show the following lemma.

Lemma 15. For every time step $t$, $E[SC^t] \le \phi^t$.

Proof. We prove this lemma by induction on $t$. At time $t = 0$, clearly it is true since the left hand
side $E[SC^0] = 0$, while the right hand side $\phi^0 = \frac{1}{2\theta}(w^0(x_1) + w^0(x_m)) - \frac{\|x_m - x_1\|}{2} = \frac{1}{2\theta}(0 + \theta\|x_m -
x_1\|) - \frac{\|x_m - x_1\|}{2} = 0$. We now argue that at each time step, the increase in the left hand side is at
most the increase in the right hand side.

Since the operating cost is convex, it is non-increasing until some point $x_{min}$ and then
non-decreasing over the set $M$. We can imagine our cost vector arriving in $\epsilon$-sized increments as follows.
We imagine sorting the cost values so that $c^t(i_1) \le c^t(i_2) \le \cdots \le c^t(i_m)$, and then view time step
$t$ as a series of smaller time steps where we apply a cost of $\epsilon$ to all states for the first $c^t(i_1)/\epsilon$ time
steps, followed by applying a cost of $\epsilon$ to all states except state $i_1$ for the next $(c^t(i_2) - c^t(i_1))/\epsilon$ time
steps (if each such cost vector’s entries strictly decrease at some point and then strictly increase
at some point, we split the vector into two vectors which add up to the original, one of which is
non-increasing and the other of which is non-decreasing), etc., where $\epsilon$ has a very small value. If
adding these $\epsilon$-sized cost vectors would cause us to exceed the original cost $c^t(i_k)$ for some $k$, then
we just use the residual $\epsilon' < \epsilon$ in the last round in which state $i_k$ has non-zero cost. Eventually,
these $\epsilon$-sized cost vectors add up precisely to the original cost vector $c^t$. Under these new cost
vectors, the algorithm’s switching cost does not get better (and the optimal solution does not get
worse). If the left hand side does not increase at all from time step $t - 1$ to $t$, then the lemma holds
(since the right hand side can only increase).

Our expected switching cost is the probability that the algorithm moves multiplied by the
distance moved. Suppose the algorithm is currently in state $x$. Observe that, by Lemma 14, there
is only one state the algorithm could be moving from (state $x$) and only one state $y$ the algorithm
could be moving to, both of which do not depend on the randomness $r$ (we can choose $\epsilon$ to be
sufficiently small in order to guarantee this). Moreover, we would never move to a state where the
work function increases by $\epsilon$. First we consider the case $x \ge x_{min}$.

The only reason we would move from state $x$ is if $w^t(x)$ increases from the previous time step,
so that we have $w^t(x) = w^{t-1}(x) + \epsilon$. By Lemma 13 we know $w^t(z) = w^{t-1}(z) + \epsilon$ for all $z > x$.
Hence, we can conclude a couple of facts. The state $y$ we move to cannot be such that $y > x$.
Moreover, we also know that $w^t(x_m) = w^{t-1}(x_m) + \epsilon$ (since $x_m \ge x$). Notice that for us to move
from state $x$ to state $y$, the random value $r$ must fall within a very specific range. In particular,
we must have $Y^t(x) \le Y^t(y)$ and $Y^{t+1}(y) \le Y^{t+1}(x)$:

$$\left(w^{t-1}(x) + \theta r\|1\|x \le w^{t-1}(y) + \theta r\|1\|y\right) \wedge \left(w^t(y) + \theta r\|1\|y \le w^t(x) + \theta r\|1\|x\right)$$
$$\Longrightarrow\; w^{t-1}(y) - w^{t-1}(x) - \epsilon \le w^t(y) - w^t(x) \le \theta r\|x - y\| \le w^{t-1}(y) - w^{t-1}(x).$$

This means $r$ must fall within an interval of length at most $\epsilon/(\theta\|x - y\|)$. Since $r$ is chosen from
an interval of length 2, this happens with probability at most $\epsilon/(2\theta\|x - y\|)$. Hence, the increase
in our expected switching cost is at most $\|x - y\| \cdot \epsilon/(2\theta\|x - y\|) = \epsilon/2\theta$. On the other hand, the
increase in the right hand side is $\frac{1}{2\theta}(w^t(x_1) - w^{t-1}(x_1) + w^t(x_m) - w^{t-1}(x_m)) \ge \epsilon/2\theta$
(since $w^t(x_m) = w^{t-1}(x_m) + \epsilon$). The case when $x < x_{min}$ is symmetric. This finishes the inductive
claim. □

Now we prove the expected switching cost of DRBG is at most the total cost of the optimal 
solution for the discrete problem. 

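By Lemma 15, for all times $t$ we have $E[SC^t] \le \phi^t$. Denote by $OPT^t$ the optimal solution at
time $t$ (so that $OPT^t = \min_x w^t(x)$ and $OPT^T = OPT_N$). Let $x^* = \arg\min_x w^t(x)$ be the final
state which realizes $OPT^t$ at time $t$. We have, for all times $t$:

$$E[SC^t] \le \phi^t = \frac{1}{2\theta}(w^t(x_1) + w^t(x_m)) - \frac{\|x_m - x_1\|}{2}$$
$$\le \frac{1}{2\theta}\left(w^t(x^*) + \theta\|x^* - x_1\| + w^t(x^*) + \theta\|x_m - x^*\|\right) - \frac{\|x_m - x_1\|}{2} = \frac{1}{\theta}OPT^t.$$

In particular, the equation holds at time $T$, which gives the bound.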
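As a sanity check on the last display, the potential can be evaluated from the work-function values at the two end states. The helper below is a hypothetical sketch that assumes $\|1\| = 1$, so that $\|x_m - x_1\|$ is the width of the interval; the work-function values in the second example were computed by hand for three rounds of cost $|x - 1|$ on the grid $\{0, 0.5, 1\}$ with $\theta = 1$ and are illustrative only.

```python
def potential(w_first, w_last, width, theta):
    """phi = (w(x_1) + w(x_m)) / (2 * theta) - width / 2, assuming ||1|| = 1."""
    return (w_first + w_last) / (2 * theta) - width / 2


# At t = 0 with x_1 = 0: w^0(x_1) = 0 and w^0(x_m) = theta * width, so phi^0 = 0.
theta0, width0 = 2.0, 1.0
phi0 = potential(0.0, theta0 * width0, width0, theta0)

# Hand-computed work function after three rounds (illustrative values).
w = [2.0, 1.5, 1.0]
phi = potential(w[0], w[-1], width=1.0, theta=1.0)
opt = min(w)  # OPT^t = min_x w^t(x)

if __name__ == "__main__":
    print(phi0, phi, opt)
```

In this small instance the bound $\phi^t \le OPT^t/\theta$ holds with equality, because the work function is exactly $\theta$-Lipschitz between the minimizer and both end states.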

F.2 Convergence of DRBG to RBG 

In this section, we are going to show that if we keep splitting the state spacing $\delta$, then the output
of DRBG, which is denoted by $x_D$, converges to the output of RBG, which is denoted by $x_C$.

Lemma 16. Consider a SOCO with $F = [x_L, x_H]$. Consider a sequence of discrete systems such
that the state spacing $\delta \to 0$ and for each system, $[x_1, x_m] = F$. Let $x_i$ denote the output of DRBG
in the $i$-th discrete system, and $x$ denote the output of RBG in the continuous system. Then the
sequence $\{x_i\}$ converges to $x$ with probability 1 as $i$ increases, i.e., for all $t$, $\lim_{i \to \infty} |x_i^t - x^t| = 0$
with probability 1.
Proof. To prove the lemma, we just need to show that $x_i$ converges pointwise to $x$ with probability 1.
For a given $\delta$, let $Y_D^t$ denote the function $Y^t$ used by DRBG in the discrete system (with feasible
set $M = \{x_1, \ldots, x_m\} \subset F$) and $Y_C^t$ denote the function $Y^t$ used by RBG in the continuous
system (with feasible set $F$). The output of DRBG and RBG at time $t$ are denoted by $x_D^t$ and
$x_C^t$ respectively. The subsequence on which $|x_C^t - x_D^t| < 2\delta$ clearly has $x_D^t$ converge to $x_C^t$. Now
consider the subsequence on which this does not hold. For each such system, we can find an
$\hat{x}_C^t \in \{x_1, \ldots, x_m\}$ satisfying $|\hat{x}_C^t - x_C^t| < \delta$ (and thus $|x_D^t - \hat{x}_C^t| > \delta$) such that $Y_C^t(\hat{x}_C^t) \le Y_C^t(x_D^t)$,
by the convexity† of $Y_C^t$. Moreover, $Y_D^t(x_D^t) \le Y_D^t(\hat{x}_C^t)$ and $Y_C^t(x_D^t) \le Y_D^t(x_D^t)$. So far, we have only
rounded the $t$-th component. Now let us consider a scheme that rounds to the set $M$ all components
$\tau \le t$ of a solution to the continuous problem.

For an arbitrary trajectory $x = (x^\tau)_{\tau=1}^{t}$, define a sequence $x_R(x)$ with $x_R^\tau \in \{x_1, \ldots, x_m\}$ as
follows. Let $l = \max\{k : x_k \le x^\tau\}$. Set $x_R^\tau$ to $x_l$ if $c^\tau(x_l) \le c^\tau(x_{l+1})$ or $l = m$, and $x_{l+1}$ otherwise.
This rounding increases the switching cost by at most $2\theta\|\delta\|$ for each timeslot. If $l = m$ then the
operating cost is unchanged. Next, we bound the increase in operating cost when $l < m$.
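The rounding scheme just described can be sketched in a few lines. This is an illustration under stated assumptions, not code from the paper: the grid is a sorted list whose first element is no larger than any trajectory point, `round_trajectory` is a hypothetical name, and indices are 0-based, so the condition $l = m$ becomes "last grid index".

```python
import bisect


def round_trajectory(xs, costs, grid):
    """Round each x^tau to a neighbouring grid state, preferring the cheaper one.

    xs    : continuous trajectory x^1, ..., x^t (each xs[k] >= grid[0])
    costs : list of callables, costs[k] is the cost function of the same timeslot
    grid  : sorted list of discrete states x_1 < ... < x_m
    """
    rounded = []
    for x, c in zip(xs, costs):
        l = bisect.bisect_right(grid, x) - 1      # largest index with grid[l] <= x
        if l == len(grid) - 1 or c(grid[l]) <= c(grid[l + 1]):
            rounded.append(grid[l])               # round down (or x is the last state)
        else:
            rounded.append(grid[l + 1])           # right neighbour is strictly cheaper
    return rounded


if __name__ == "__main__":
    grid = [0.0, 0.5, 1.0]
    xs = [0.3, 0.6, 1.0]
    costs = [lambda x: abs(x - 1.0)] * len(xs)
    print(round_trajectory(xs, costs, grid))
```

Each point moves by less than the spacing $\delta$, so any difference of two consecutive points changes by less than $2\delta$, which is the source of the $2\theta\|\delta\|$ term per timeslot once distances are measured in the norm $N$.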

For each timeslot $\tau$, depending on the shape of $c^\tau$ on $(x_l, x_{l+1})$, we may have two cases: (i) $c^\tau$
is monotone; (ii) $c^\tau$ is non-monotone. In case (i), the rounding does not make the operating cost
increase for this timeslot. Note that if $x^\tau \in \{x_L, x_H\}$ then for all sufficiently small $\delta$, case (ii)
cannot occur, since the location of the minimum of $c^\tau$ is independent of $\delta$. We now consider case
(ii) with $x^\tau \in (x_L, x_H)$. Note that there must be a finite left and right derivative of $c^\tau$ at all points
in $(x_L, x_H)$ for $c^\tau$ to be finite on $F$. Hence, these derivatives must be bounded on any compact
subset of $(x_L, x_H)$. Since $x^\tau \in (x_L, x_H)$, there exists a set $[x_L', x_H'] \subset (x_L, x_H)$ independent of $\delta$
such that, for sufficiently small $\delta$, we have $[x_l, x_{l+1}] \subset [x_L', x_H']$. Hence, there exists an $H^\tau$ such
that, for sufficiently small $\delta$, the gradient of $c^\tau$ is bounded by $H^\tau$ on $[x_l, x_{l+1}]$. Thus, for sufficiently
small $\delta$, the rounding makes the operating cost increase by at most $H^\tau\delta$ in timeslot $\tau$.

† The minimum of a convex function over a convex set is convex. Thus, by definition, $w^t$ is a convex function by
induction, and hence $Y^t$ is convex as well.

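Define $H = \max_\tau\{H^\tau\}$. If we apply this scheme to the trajectory which achieves $Y_C^t(\hat{x}_C^t)$, we get
a decision sequence in the discrete system with cost $+\, r\theta\|\hat{x}_C^t\|$ not more than $Y_C^t(\hat{x}_C^t) + (H\delta + 2\theta\|\delta\|)t$
(by the foregoing bound on the increase in costs) and not less than $Y_D^t(\hat{x}_C^t)$ (because the solution
of $Y_D^t(\hat{x}_C^t)$ minimizes cost $+\, r\theta\|\hat{x}_C^t\|$). Specifically, we have $Y_D^t(\hat{x}_C^t) \le Y_C^t(\hat{x}_C^t) + (H\delta + 2\theta\|\delta\|)t$.
Therefore,

$$Y_C^t(\hat{x}_C^t) \le Y_C^t(x_D^t) \le Y_D^t(x_D^t) \le Y_D^t(\hat{x}_C^t) \le Y_C^t(\hat{x}_C^t) + (H\delta + 2\theta\|\delta\|)t.$$

Notice that the gradient bound $H$ is independent of $\delta$, and so $(H\delta + 2\theta\|\delta\|)t \to 0$ as $\delta \to 0$.
Therefore, $|Y_C^t(x_D^t) - Y_C^t(\hat{x}_C^t)|$ converges to 0 as $i$ increases.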

Independent of the random choice $r$, the domain of $w_C^{t-1}(\cdot)$ can be divided into countably many
non-singleton intervals on which $w_C^{t-1}(\cdot)$ is affine, joined by intervals on which it is strictly convex.
Then $Y_C^t(\cdot)$ has a unique minimum unless $-r$ is equal to the slope of one of the former intervals,
since $Y_C^t(\cdot)$ is convex. Hence, it has a unique minimum with probability one with respect to the
choice of $r$.

Hence, with probability one, $x_C^t$ is the unique minimum of $Y_C^t$. To see that $Y_C^t(\cdot)$ is continuous
at any point $a$, apply the squeeze principle to the inequality $w_C^{t-1}(a) \le w_C^{t-1}(x) + \theta\|x - a\| \le w_C^{t-1}(a) +
2\theta\|x - a\|$, and note that $Y_C^t(\cdot)$ is $w_C^{t-1}(\cdot)$ plus a continuous function. The convergence of $|\hat{x}_C^t - x_C^t|$ then
implies $|Y_C^t(\hat{x}_C^t) - Y_C^t(x_C^t)| \to 0$ and thus $|Y_C^t(x_D^t) - Y_C^t(x_C^t)| \to 0$, or equivalently $Y_C^t(x_D^t) \to Y_C^t(x_C^t)$.

Note that the restriction of $Y_C^t$ to $[x_L, x_C^t]$ has a well-defined inverse $Y^{-1}$, which is continuous
at $Y_C^t(x_C^t)$. Hence, for the subsequence of $x_D^t$ such that $x_D^t < x_C^t$, we have $x_D^t = Y^{-1}(Y_C^t(x_D^t)) \to
Y^{-1}(Y_C^t(x_C^t)) = x_C^t$. Similarly, the subsequence such that $x_D^t > x_C^t$ also converges to $x_C^t$. □

F.3 Convergence of OPT in Discrete Systems 

To show that the competitive ratio holds for RBG, we also need to show that the optimal costs 
converge to those of the continuous system. 

Lemma 17. Consider a SOCO problem with $F = [x_L, x_H]$. Consider a sequence of discrete
systems such that the state spacing $\delta \to 0$ and for each system, $[x_1, x_m] = F$. Let $OPT_D^i$ denote
the optimal cost in the $i$-th discrete system, and $OPT_C$ denote the optimal cost in the continuous
system (both under the norm $N$). Then the sequence $\{OPT_D^i\}$ converges to $OPT_C$ as $i$ increases,
i.e., $\lim_{i \to \infty} |OPT_D^i - OPT_C| = 0$.

Proof. We can apply the same rounding scheme in the proof of Lemma 16 to the solution vector
of $OPT_C$ to get a discrete output with total cost bounded by $OPT_C + (H\delta + 2\theta\|\delta\|)T$, thus

$$OPT_D^i \le OPT_C + (H\delta + 2\theta\|\delta\|)T.$$

Since every discrete trajectory is also feasible for the continuous problem, $OPT_C \le OPT_D^i$.
Notice that the gradient bound $H$ is independent of $\delta$ and so $(H\delta + 2\theta\|\delta\|)T \to 0$ as $\delta \to 0$.
Therefore, $OPT_D^i$ converges to $OPT_C$ as $i$ increases. □